10 research outputs found
A wide-spectrum language for verification of programs on weak memory models
Modern processors deploy a variety of weak memory models, which for
efficiency reasons may (appear to) execute instructions in an order different
to that specified by the program text. The consequences of instruction
reordering can be complex and subtle, and can impact on ensuring correctness.
Previous work on the semantics of weak memory models has focussed on the
behaviour of assembler-level programs. In this paper we utilise that work to
extract some general principles underlying instruction reordering, and apply
those principles to a wide-spectrum language encompassing abstract data types
as well as low-level assembler code. The goal is to support reasoning about
implementations of data structures for modern processors with respect to an
abstract specification.
Specifically, we define an operational semantics, from which we derive some
properties of program refinement, and encode the semantics in the rewriting
engine Maude as a model-checking tool. The tool is used to validate the
semantics against the behaviour of a set of litmus tests (small assembler
programs) run on hardware, and also to model check implementations of data
structures from the literature against their abstract specifications
Modelling the ARMv8 architecture, operationally: Concurrency and ISA
Copyright is held by the owner/author(s). In this paper we develop semantics for key aspects of the ARMv8 multiprocessor architecture: the concurrency model and much of the 64-bit application-level instruction set (ISA). Our goal is to clarify what the range of architecturally allowable behaviour is, and thereby to support future work on formal verification, analysis, and testing of concurrent ARM software and hardware. Establishing such models with high confidence is intrinsically difficult: it involves capturing the vendor's architectural intent, aspects of which (especially for concurrency) have not previously been precisely defined. We therefore first develop a concurrency model with a microarchitectural flavour, abstracting from many hardware implementation concerns but still close to hardware-designer intuition. This means it can be discussed in detail with ARM architects. We then develop a more abstract model, better suited for use as an architectural specification, which we prove sound w.r.t. the first. The instruction semantics involves further difficulties, handling the mass of detail and the subtle intensional information required to interface to the concurrency model. We have a novel ISA description language, with a lightweight dependent type system, letting us do both with a rather direct representation of the ARM reference manual instruction descriptions. We build a tool from the combined semantics that lets one explore, either interactively or exhaustively, the full range of architecturally allowed behaviour, for litmus tests and (small) ELF executables. We prove correctness of some optimisations needed for tool performance. We validate the models by discussion with ARM staff, and by comparison against ARM hardware behaviour, for ISA single-instruction tests and concurrent litmus tests.This work was partly funded
by the EPSRC Programme Grant REMS: Rigorous Engineering for
Mainstream Systems, EP/K008528/1, the Scottish Funding Council (SICSA Early Career Industry Fellowship, Sarkar), an ARM
iCASE award (Pulte), and ANR grant WMC (ANR-11-JS02-011,
Maranget)
Mixed-size concurrency: ARM, POWER, C/C++11, and SC
Previous work on the semantics of relaxed shared-memory concurrency has only considered the case in which each load reads the data of exactly one store. In practice, however, multiprocessors support mixed-size accesses, and these are used by systems software and (to some degree) exposed at the C/C++ language level. A semantic foundation for software, therefore, has to address them.
We investigate the mixed-size behaviour of ARMv8 and IBM POWER architectures and implementations: by experiment, by developing semantic models, by testing the correspondence between these, and by discussion with ARM and IBM staff. This turns out to be surprisingly subtle, and on the way we have to revisit the fundamental concepts of coherence and sequential consistency, which change in this setting. In particular, we show that adding a memory barrier between each instruction does not restore sequential consistency. We go on to extend the C/C++11 model to support nonatomic mixed-size memory accesses, and prove the standard compilation scheme from C11 atomics to POWER remains sound.
This is a necessary step towards semantics for real-world shared-memory concurrent code, beyond litmus tests
Recommended from our members
Simplifying ARM concurrency: Multicopy-atomic axiomatic and operational models for ARMv8
ARM has a relaxed memory model, previously specified in informal prose for ARMv7 and ARMv8. Over time, and partly due to work building formal semantics for ARM concurrency, it has become clear that some of the complexity of the model is not justified by the potential benefits. In particular, the model was originally
non-multicopy-atomic
: writes could become visible to some other threads before becoming visible to all — but this has not been exploited in production implementations, the corresponding potential hardware optimisations are thought to have insufficient benefits in the ARM context, and it gives rise to subtle complications when combined with other ARMv8 features. The ARMv8 architecture has therefore been revised: it now has a multicopy-atomic model. It has also been simplified in other respects, including more straightforward notions of dependency, and the architecture now includes a formal concurrency model.
In this paper we detail these changes and discuss their motivation. We define two formal concurrency models: an operational one, simplifying the Flowing model of Flur et al., and the axiomatic model of the revised ARMv8 specification. The models were developed by an academic group and by ARM staff, respectively, and this extended collaboration partly motivated the above changes. We prove the equivalence of the two models. The operational model is integrated into an executable exploration tool with new web interface, demonstrated by exhaustively checking the possible behaviours of a loop-unrolled version of a Linux kernel lock implementation, a previously known bug due to unprevented speculation, and a fixed version.</jats:p
Mixed-size concurrency: ARM, POWER, C/C++11, and SC
Previous work on the semantics of relaxed shared-memory concurrency has only considered the case in which each load reads the data of exactly one store. In practice, however, multiprocessors support mixed-size accesses, and these are used by systems software and (to some degree) exposed at the C/C++ language level. A semantic foundation for software, therefore, has to address them.
We investigate the mixed-size behaviour of ARMv8 and IBM POWER architectures and implementations: by experiment, by developing semantic models, by testing the correspondence between these, and by discussion with ARM and IBM staff. This turns out to be surprisingly subtle, and on the way we have to revisit the fundamental concepts of coherence and sequential consistency, which change in this setting. In particular, we show that adding a memory barrier between each instruction does not restore sequential consistency. We go on to extend the C/C++11 model to support nonatomic mixed-size memory accesses, and prove the standard compilation scheme from C11 atomics to POWER remains sound.
This is a necessary step towards semantics for real-world shared-memory concurrent code, beyond litmus tests
Recommended from our members
Research data supporting “Mixed-size Concurrency: ARM, POWER, C/C++11, and SC”
Supplementary material for Mixed-size Concurrency: ARM, POWER, C/C++11, and SCEPSRC [EP/K008528/1, EP/M027317/1], ARM iCASE award, Gates Cambridge Scholarship, ANR grant WMC [ANR-11-JS02-011